Sequence-to-Sequence Acoustic Modeling for Voice Conversion


Similar resources

Statistical sequence-to-frame mapping techniques for voice conversion

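Abstract: The goal of voice conversion is to convert the voice of one speaker into the voice of another. This can be viewed as finding a mapping function between the speech time series of the two speakers. Statistical mapping methods based on GMMs [1], [2] are widely used for the voice conversion task. However, because GMM-based conversion techniques use a frame-to-frame mapping function, they do not fully exploit the context information in the speech time series. HMMs are an effective model of speech time series and are widely used in speech recognition and speech synthesis. This work studies HMM-based voice conversion. We derive an HMM-based regression, that is, a sequence-to-frame conversion function. Previous HMM-based voice conversion methods [3]-[5] segment the speech by forced alignment and convert each segment separately. Unlike those methods, we...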


Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks

We propose a training framework for sequence-to-sequence voice conversion (SVC). A well-known problem regarding a conventional VC framework is that acoustic-feature sequences generated from a converter tend to be over-smoothed, resulting in buzzy-sounding speech. This is because a particular form of similarity metric or distribution for parameter training of the acoustic model is assumed so tha...


Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities

Voice conversion (VC) using sequence-to-sequence learning of context posterior probabilities is proposed. Conventional VC using shared context posterior probabilities predicts target speech parameters from the context posterior probabilities estimated from the source speech parameters. Although conventional VC can be built from non-parallel data, it is difficult to convert speaker individuality...


Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion

Recently, neural sequence-to-sequence (Seq2Seq) models have been applied to the problem of grapheme-to-phoneme (G2P) conversion. These models offer a straightforward way of modeling the conversion by jointly learning the alignment and translation of input to output tokens in an end-to-end fashion. However, until now this approach did not show improved error rates on its own compared to traditio...


High-order sequence modeling using speaker-dependent recurrent temporal restricted boltzmann machines for voice conversion

This paper presents a voice conversion (VC) method that utilizes recently proposed recurrent temporal restricted Boltzmann machines (RTRBMs) for each speaker, with the goal of capturing high-order temporal dependencies in an acoustic sequence. Our algorithm starts from the separate training of two RTRBMs for a source and target speaker using speaker-dependent training data. Since each RTRBM att...



Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2019

ISSN: 2329-9290,2329-9304

DOI: 10.1109/taslp.2019.2892235